Data wranglingArtwork by @allison_horst
To:Bot Bakery
Subject:Big ship treat order
Attached:crew_food_allergies.XLSX
Hi friend,
Here’s that table of my crew’s goody preferences. 😋
Guess which one is me!
We can download the attached Excel file from: here
After you’ve downloaded the file, move the file from your Downloads/ folder to a data/ folder in your project’s main directory:
Files pane.Have an EXCEL file? The readxl package has the
function read_excel() for reading Excel data into R.
readxl:install.packages(readxl)
library(readxl)
crew_df <- read_excel("data/crew_food_prefs.XLSX")
Ope! What happened to our column names / headers?
crew_df <- read_excel("data/crew_food_prefs.XLSX", skip = 1)
glimpse(crew_df)
summary(crew_df)
filter() out the test dataThere appears to be test data included in the table. Yuk! Let’s filter it out so it doesn’t influence our results.
crew_df <- filter(crew_df, id != "test")
select() what we needThere’s also some columns in here we don’t need. Let’s drop the country people are from since that won’t help us.
crew_df <- select(crew_df, -country)
Let’s create a separate smaller favorites table with
only the id and favorite_goody columns.
favorites_df <- select(crew_df, id, favorite_goody)
Let’s create a separate allergy table with only the
id and all of the columns that contain the word
“allergy”.
allergy_df <- select(crew_df, id, contains("allergy"))
Finally, for easier reference, use select() to make the
id and days_on_ship columns the first two
columns in the table. Everything else can be in the same order.
favorites_df <- select(crew_df, id, days_on_ship, everything())
arrange() the crewLet’s use arrange() to sort the crew members so the ones
that will be on the ship the longest are on top. They’ll be the ones
munching on are goodies the longest.
crew_df <- arrange(crew_df, days_on_ship)
Ope! What happened?
crew_df <- arrange(crew_df, desc(days_on_ship))
-column_name instead of
desc(column_name)crew_df <- arrange(crew_df, -days_on_ship)
mutate() new columnsLet’s get an idea of the total meals each crew mate will be eating
baked goods. For that, we’ll need to take their
days_on_the_ship times their
meals_per_day.
# Use the * to multiply and the / to divide
crew_df <- mutate(crew_df, total_trip_meals = days_on_the_ship * meals_per_day)
Forks in the road: if_else() and case_when()if_else(days_on_ship > 7, true = “yes”, false = “no”)
if_else(days_on_ship > 7, true = “yes”, false = “no”, missing = “unknown”)
if_else(egg_allergy, 1, 0)
ifelse(egg_allergy, 1, 0)
Boil it down // Show me the summarySome like crawfish boils and others like data boils. Let’s boil down our data and find some summary statistics.